library(tidyverse)
library(readr)
library(ggplot2)
library(summarytools)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

Final Project Assignment #1: Cam Needels
Part 1. Introduction
This data set contains trending YouTube videos in the United States from August 11th, 2020 through April 10th, 2023. It is collected from the top-trending video list provided by YouTube itself, which is designed to surface the videos that drive the most traffic to the site. Because YouTube has data from all over the world and different videos trend in different countries, this data set covers only the US. Each case represents a video that is trending on a particular date, which means the same video can appear multiple times if it trends on multiple dates.
Which topics have the most trending videos? Which variables are most strongly related to higher view counts? Which channels trended most often? Which videos were the most viewed during this time period?
Part 2. Describe the data set(s)
As stated above, this data set covers the trending YouTube videos in the US from August 11th, 2020 through April 10th, 2023. It contains 195,390 trending-video records across different dates (with some duplicates), representing 36,572 unique videos. The data set has 16 columns: view count, title, publish date, channel name, category id (a numeric code rather than a category name, which actually simplifies my statistical analysis section), YouTube channel id, likes, dislikes, comment count, the date the video was trending, the tags on the video, the thumbnail link, whether comments were disabled, whether ratings were disabled, and the description. This is a large amount of data, and I'm considering removing duplicates to make it more manageable, but I haven't decided whether that's the route I want to take. I'm mostly interested in comment_count, dislikes, likes, and view_count, as those are the most important variables for YouTube videos.
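If I do remove duplicates, it could look like the sketch below. This is a minimal example on a toy data frame, assuming each video is identified by a `video_id` column and that keeping a video's most recent trending appearance preserves its largest view-count snapshot (the column names here are placeholders, not necessarily the real ones).

```r
library(dplyr)

# Toy stand-in for the trending data: the same video can appear on
# several trending dates, so video_id values repeat.
trending <- tibble::tibble(
  video_id      = c("a1", "a1", "b2", "c3", "c3", "c3"),
  trending_date = as.Date(c("2020-08-11", "2020-08-12", "2020-08-11",
                            "2020-09-01", "2020-09-02", "2020-09-03")),
  view_count    = c(100, 250, 80, 40, 90, 150)
)

# Keep only the last trending appearance of each video, so the
# retained row carries the most recent view_count snapshot.
unique_videos <- trending %>%
  arrange(video_id, trending_date) %>%
  group_by(video_id) %>%
  slice_tail(n = 1) %>%
  ungroup()

nrow(unique_videos)  # 3 unique videos instead of 6 rows
```

An alternative is `distinct(trending, video_id, .keep_all = TRUE)`, which instead keeps each video's first appearance.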
YouTube <- read.csv("B:/Needels/Documents/DACCS 601/DACSS_601_New/posts/CamNeedels_FinalProjectData/US_youtube_trending_data.csv")
head(YouTube)
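Once the data is loaded, the counting questions from the introduction can be sketched with simple dplyr summaries. The example below runs on a small toy tibble using the column names I expect in the trending export (`categoryId`, `channelTitle`, `view_count`); if those assumptions hold, the same calls should transfer directly to the full YouTube data frame.

```r
library(dplyr)

# Toy rows mimicking the trending data's structure (hypothetical values).
trending <- tibble::tibble(
  channelTitle = c("MrBeast", "MrBeast", "NASA", "NASA", "NASA", "Vox"),
  categoryId   = c(24, 24, 28, 28, 28, 25),
  view_count   = c(5e6, 7e6, 1e6, 2e6, 3e6, 5e5)
)

# Which categories have the most trending videos?
category_counts <- trending %>%
  count(categoryId, sort = TRUE)

# Which channels trended most often?
channel_counts <- trending %>%
  count(channelTitle, sort = TRUE)

category_counts  # categoryId 28 appears most often in the toy data
channel_counts   # NASA trends most often in the toy data
```

Swapping `trending` for the real `YouTube` data frame would answer these questions at full scale, and adding `slice_max(view_count, n = 10)` would surface the most-viewed videos.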